========================================================
Explore and Summarize Data is a project that explores red wine. The dataset contains 1599 observations and 13 variables.
## [1] "X" "fixed.acidity" "volatile.acidity"
## [4] "citric.acid" "residual.sugar" "chlorides"
## [7] "free.sulfur.dioxide" "total.sulfur.dioxide" "density"
## [10] "pH" "sulphates" "alcohol"
## [13] "quality"
## X fixed.acidity volatile.acidity citric.acid
## Min. : 1.0 Min. : 4.60 Min. :0.1200 Min. :0.000
## 1st Qu.: 400.5 1st Qu.: 7.10 1st Qu.:0.3900 1st Qu.:0.090
## Median : 800.0 Median : 7.90 Median :0.5200 Median :0.260
## Mean : 800.0 Mean : 8.32 Mean :0.5278 Mean :0.271
## 3rd Qu.:1199.5 3rd Qu.: 9.20 3rd Qu.:0.6400 3rd Qu.:0.420
## Max. :1599.0 Max. :15.90 Max. :1.5800 Max. :1.000
## residual.sugar chlorides free.sulfur.dioxide
## Min. : 0.900 Min. :0.01200 Min. : 1.00
## 1st Qu.: 1.900 1st Qu.:0.07000 1st Qu.: 7.00
## Median : 2.200 Median :0.07900 Median :14.00
## Mean : 2.539 Mean :0.08747 Mean :15.87
## 3rd Qu.: 2.600 3rd Qu.:0.09000 3rd Qu.:21.00
## Max. :15.500 Max. :0.61100 Max. :72.00
## total.sulfur.dioxide density pH sulphates
## Min. : 6.00 Min. :0.9901 Min. :2.740 Min. :0.3300
## 1st Qu.: 22.00 1st Qu.:0.9956 1st Qu.:3.210 1st Qu.:0.5500
## Median : 38.00 Median :0.9968 Median :3.310 Median :0.6200
## Mean : 46.47 Mean :0.9967 Mean :3.311 Mean :0.6581
## 3rd Qu.: 62.00 3rd Qu.:0.9978 3rd Qu.:3.400 3rd Qu.:0.7300
## Max. :289.00 Max. :1.0037 Max. :4.010 Max. :2.0000
## alcohol quality
## Min. : 8.40 Min. :3.000
## 1st Qu.: 9.50 1st Qu.:5.000
## Median :10.20 Median :6.000
## Mean :10.42 Mean :5.636
## 3rd Qu.:11.10 3rd Qu.:6.000
## Max. :14.90 Max. :8.000
The output above comes from loading the data, listing the names of the variables (columns), and printing the summary statistics for each variable.
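A minimal sketch of that first step is shown here. The report does not give the name of the data file, so instead of reading a file the sketch builds a small data frame, hypothetically called `wine`, from the first ten rows of four columns as printed in the `str()` output later in this report. On the real data, a `read.csv()` call would take the place of the `data.frame()` call.

```r
# First ten rows of four columns, copied from the str() output in this report.
# `wine` is an illustrative name; the full dataset has 1599 rows.
wine <- data.frame(
  fixed.acidity = c(7.4, 7.8, 7.8, 11.2, 7.4, 7.4, 7.9, 7.3, 7.8, 7.5),
  pH            = c(3.51, 3.2, 3.26, 3.16, 3.51, 3.51, 3.3, 3.39, 3.36, 3.35),
  alcohol       = c(9.4, 9.8, 9.8, 9.8, 9.4, 9.4, 9.4, 10, 9.5, 10.5),
  quality       = c(5L, 5L, 5L, 6L, 5L, 5L, 5L, 7L, 7L, 5L)
)

dim(wine)      # number of observations and variables
names(wine)    # variable (column) names
summary(wine)  # min, quartiles, median, mean, and max for each variable
```

Because only ten rows are used, the numbers printed by this sketch will not match the full-data summary above.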
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.000 5.000 6.000 5.636 6.000 8.000
The quality histogram is roughly normally distributed, with a mean of 5.636. I also summarized the data to make sure the chart is correct.
As we can see, fixed acidity is skewed to the right.
As we can see, volatile acidity is skewed to the right.
As we can see, citric acid isn't normally distributed.
As we can see, residual sugar is also skewed to the right.
As we can see, chlorides are also skewed to the right.
Free sulfur dioxide is also skewed to the right.
Total sulfur dioxide is also skewed to the right.
As we can see, density is normally distributed.
pH is also normally distributed.
Sulphates are also skewed to the right.
Alcohol is also skewed to the right.
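One rough way to cross-check these readings of the histograms is to compare each variable's mean with its median: a mean clearly above the median is consistent with a right skew, while a gap near zero is consistent with a symmetric shape. The sketch below uses only the means and medians printed in the summary output earlier in this report. The data frame name `stats` and the column `relative.gap` are illustrative choices, and the comparison is a heuristic, not a formal skewness test.

```r
# Means and medians copied from the summary() output in this report.
stats <- data.frame(
  variable = c("fixed.acidity", "volatile.acidity", "citric.acid",
               "residual.sugar", "chlorides", "free.sulfur.dioxide",
               "total.sulfur.dioxide", "density", "pH", "sulphates", "alcohol"),
  mean   = c(8.32, 0.5278, 0.271, 2.539, 0.08747, 15.87,
             46.47, 0.9967, 3.311, 0.6581, 10.42),
  median = c(7.90, 0.5200, 0.260, 2.200, 0.07900, 14.00,
             38.00, 0.9968, 3.310, 0.6200, 10.20)
)

# Gap between mean and median, also expressed relative to the median
# so that variables on different scales can be compared.
stats$gap <- stats$mean - stats$median
stats$relative.gap <- stats$gap / stats$median

# Largest relative gaps first: these are the strongest right-skew candidates.
stats[order(-stats$relative.gap), ]
```

In this table density and pH have gaps very close to zero, which agrees with the histograms that look normally distributed, while the remaining variables all have a mean above the median.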
The chart above shows the new variable I created, alcohol.Quality, which rates each wine as low, medium, or excellent. From the chart, the medium level has the highest wine count.
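A variable like this can be built with `cut()`, as in the sketch below. The cut points are not stated in this report, so the breaks used here (3 to 4 as low, 5 to 6 as medium, 7 and above as excellent) are an assumption, chosen only because they reproduce the level codes shown in the `str()` output for the first ten wines. The spelling of the third label is also assumed from the prose.

```r
# Quality scores of the first ten wines, copied from the str() output.
quality <- c(5, 5, 5, 6, 5, 5, 5, 7, 7, 5)

# Assumed cut points: (0,4] -> low, (4,6] -> Medium, (6,10] -> excellent.
alcohol.Quality <- cut(
  quality,
  breaks = c(0, 4, 6, 10),
  labels = c("low", "Medium", "excellent"),
  ordered_result = TRUE   # ordered factor, as in the str() output
)

as.integer(alcohol.Quality)  # level codes for the ten wines
table(alcohol.Quality)       # wine count per level
```

An ordered factor keeps the levels in a meaningful order on plot axes and in tables, which suits a rating that runs from low to excellent.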
## 'data.frame': 1599 obs. of 14 variables:
## $ X : int 1 2 3 4 5 6 7 8 9 10 ...
## $ fixed.acidity : num 7.4 7.8 7.8 11.2 7.4 7.4 7.9 7.3 7.8 7.5 ...
## $ volatile.acidity : num 0.7 0.88 0.76 0.28 0.7 0.66 0.6 0.65 0.58 0.5 ...
## $ citric.acid : num 0 0 0.04 0.56 0 0 0.06 0 0.02 0.36 ...
## $ residual.sugar : num 1.9 2.6 2.3 1.9 1.9 1.8 1.6 1.2 2 6.1 ...
## $ chlorides : num 0.076 0.098 0.092 0.075 0.076 0.075 0.069 0.065 0.073 0.071 ...
## $ free.sulfur.dioxide : num 11 25 15 17 11 13 15 15 9 17 ...
## $ total.sulfur.dioxide: num 34 67 54 60 34 40 59 21 18 102 ...
## $ density : num 0.998 0.997 0.997 0.998 0.998 ...
## $ pH : num 3.51 3.2 3.26 3.16 3.51 3.51 3.3 3.39 3.36 3.35 ...
## $ sulphates : num 0.56 0.68 0.65 0.58 0.56 0.56 0.46 0.47 0.57 0.8 ...
## $ alcohol : num 9.4 9.8 9.8 9.8 9.4 9.4 9.4 10 9.5 10.5 ...
## $ quality : int 5 5 5 6 5 5 5 7 7 5 ...
## $ alcohol.Quality : Ord.factor w/ 3 levels "low"<"Medium"<..: 2 2 2 2 2 2 2 3 3 2 ...
My dataset contains 1599 observations and 13 variables, plus 1 variable that I created, so I have 14 variables.
The main feature of interest in my dataset is the quality of the wine.
Other features of interest are sulphates and pH.
I created one new variable, alcohol.Quality.
No, I did not find any unusual distributions.
First, I will start with scatter plots.
The relationship between citric acid and fixed acidity is positive: the higher the citric acid, the higher the fixed acidity.
The relationship between density and fixed acidity is positive: the higher the density, the higher the fixed acidity.
The relationship between pH and fixed acidity is negative: the lower the pH, the higher the fixed acidity.
The relationship between total sulfur dioxide and free sulfur dioxide is positive: the higher the total sulfur dioxide, the higher the free sulfur dioxide.
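The direction of each relationship can also be checked numerically with `cor()`, which returns a positive value for a positive relationship and a negative value for a negative one. The sketch below is only an illustration: it uses the first ten rows printed in the `str()` output later in this report, so the coefficients it prints will differ from those of the full 1599-row dataset. Density is left out because only five rounded values are shown for it.

```r
# First ten rows of each variable, copied from the str() output.
fixed.acidity        <- c(7.4, 7.8, 7.8, 11.2, 7.4, 7.4, 7.9, 7.3, 7.8, 7.5)
citric.acid          <- c(0, 0, 0.04, 0.56, 0, 0, 0.06, 0, 0.02, 0.36)
pH                   <- c(3.51, 3.2, 3.26, 3.16, 3.51, 3.51, 3.3, 3.39, 3.36, 3.35)
free.sulfur.dioxide  <- c(11, 25, 15, 17, 11, 13, 15, 15, 9, 17)
total.sulfur.dioxide <- c(34, 67, 54, 60, 34, 40, 59, 21, 18, 102)

# Pearson correlation (the default method of cor()).
r.citric <- cor(citric.acid, fixed.acidity)                 # expected sign: +
r.pH     <- cor(pH, fixed.acidity)                          # expected sign: -
r.sulfur <- cor(total.sulfur.dioxide, free.sulfur.dioxide)  # expected sign: +

round(c(citric = r.citric, pH = r.pH, sulfur = r.sulfur), 2)
```

Even on this small sample the signs agree with the scatter plots: positive for citric acid and for the two sulfur dioxide measures, negative for pH.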
The box plot shows that wine of excellent quality has the lowest median density.
The box plot shows that wine of excellent quality has the highest median alcohol.
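The medians that a box plot draws can also be computed directly with `tapply()`. The sketch below does this for alcohol using the first ten rows from the `str()` output. The grouping cut points are an assumption (they are not stated in this report), and with only ten rows the low level is empty, so its median comes out as `NA`.

```r
# First ten rows, copied from the str() output in this report.
alcohol <- c(9.4, 9.8, 9.8, 9.8, 9.4, 9.4, 9.4, 10, 9.5, 10.5)
quality <- c(5, 5, 5, 6, 5, 5, 5, 7, 7, 5)

# Assumed cut points: (0,4] -> low, (4,6] -> Medium, (6,10] -> excellent.
alcohol.Quality <- cut(
  quality,
  breaks = c(0, 4, 6, 10),
  labels = c("low", "Medium", "excellent"),
  ordered_result = TRUE
)

# Median alcohol per quality level; empty levels yield NA.
medians <- tapply(alcohol, alcohol.Quality, median)
medians
```

On these ten rows the excellent level already has a higher median alcohol than the medium level, in line with the box plot of the full data.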
Positive relationships:
citric acid and fixed acidity; density and fixed acidity; total sulfur dioxide and free sulfur dioxide.
Negative relationships:
pH and fixed acidity.
Other relationships I looked at: pH and density, and residual sugar and density.
Density and fixed acidity have the strongest relationship.
From the previous graphs, I see that alcohol and residual.sugar are important for the quality of the wine.
Alcohol and citric.acid.
The histogram shows that good-quality wines make up about 80% of the wines, and the distribution is roughly normal, centered around 5 to 6.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.40 9.50 10.20 10.42 11.10 14.90
The histogram shows that alcohol is skewed to the right, with a mean value of 10.42.
## geom_point: na.rm = FALSE
## stat_summary: fun.data = NULL, fun.y = mean, fun.ymax = NULL, fun.ymin = NULL, fun.args = list(), na.rm = FALSE
## position_identity
The box plot shows that wine of excellent quality has the lowest median density.
In the Explore and Summarize Data project I worked with the R language, and this was my first time programming in it. It is easy to understand but slightly tricky, so you need to be careful. Overall, it was a good experience to learn something new and immediately apply it in a project like this.
This dataset contains 1599 observations of 14 variables. Step by step, I realized that the alcohol concentration is related to both the quality and the density, so I created a new variable called alcohol.Quality to measure the levels of wine quality. I also created it because I figured that quality is what matters most about the product, which is the wine. In the near future, I hope to analyze quality more deeply, along with the effect of the other chemical variables and the statistics.